Stochastic Kronecker Graph on Vertex-Centric BSP 

Ernest Ryu* and Sean Choi†

*Institute for Computational and Mathematical Engineering, Stanford University
†Department of Computer Science, Stanford University



Abstract 

Recently, Stochastic Kronecker Graph (SKG), a network generation model, and vertex-centric BSP, a graph processing framework like Pregel, have attracted much attention in the network analysis community. Unfortunately, the two are not well suited to each other, and thus an implementation of SKG on vertex-centric BSP must be done either serially or in an unnatural manner.

In this paper, we present a new network generation model, which we call Poisson Stochastic Kronecker Graph (PSKG), that generates edges according to the Poisson distribution. The advantage of PSKG is that it is easily parallelizable on vertex-centric BSP, requires no communication between computational nodes, and yet retains all the desired properties of SKG.



1 Introduction

With the advent of massive real-world network data and the computational power to process them, network analysis is becoming a major topic of scientific research. As approaches to modeling real-world networks, Stochastic Kronecker Graph (SKG) [7] and its predecessor R-MAT [5] have attracted interest in the network analysis community due to their simplicity and their ability to capture many properties of real-world networks. As a programming model for processing large graphs, vertex-centric BSP, as implemented in Pregel [8], Apache Giraph [2], GPS [9], and Apache Hama [3], has become increasingly popular as an alternative to MapReduce and Hadoop, which are ill-suited to running massive-scale graph algorithms [8].

The two, however, are not well suited to each other. The obvious approach to parallelizing SKG, which is to generate edges in parallel, is not "vertex-centric" in nature and therefore is unnatural to program and runs inefficiently in vertex-centric BSP.

Therefore we present a new network generation model, which we call Poisson Stochastic Kronecker Graph (PSKG), as an alternative. In this model, the out-degree of each vertex is determined by independent but non-identical Poisson random variables, and the destination nodes of the edges are determined in a recursive manner similar to SKG.

The resulting algorithm, PSKG, is essentially equivalent to SKG and will therefore retain all of its desired properties. Unlike SKG, however, PSKG is embarrassingly parallel in a vertex-centric manner and is therefore very well suited to vertex-centric BSP.

Part of this work was done while the second author was visiting LinkedIn.



[Figure 1: an adjacency matrix with rows and columns indexed by nodes, recursively partitioned into sub-regions labeled $p_{00}$, $p_{01}$, $p_{10}$, $p_{11}$.]

Figure 1: Illustration of SKG. At each of the $k$ steps a sub-region is chosen with probability $p_{ij}$.






2 Theory and Algorithm 

The main result of this paper, PSKG, is presented in Algorithm 3. We shall, however, start by discussing the original SKG and an equivalent formulation of it, as this path will motivate PSKG. We then present the main algorithm. We conclude this section by discussing issues with load balancing.

Remark. Throughout this paper we shall use zero- 
based indexing for vectors and matrices. 

2.1 Stochastic Kronecker Graph 

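Consider the problem of generating a random graph of $E$ edges and $N = n^k$ vertices, where $k \in \mathbb{N}$. Let

$$P = \begin{bmatrix} p_{0,0} & \cdots & p_{0,n-1} \\ \vdots & \ddots & \vdots \\ p_{n-1,0} & \cdots & p_{n-1,n-1} \end{bmatrix}$$

be the "initiator matrix," where $p_{ij} \ge 0$ and $\sum_{i,j} p_{ij} = 1$.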

The approach in SKG is to start off with an empty adjacency matrix and "drop" edges into the matrix one at a time. Each edge chooses one of the $n \times n$ partitions, partition $(i, j)$ being chosen with probability $p_{ij}$. The chosen partition is again subdivided into smaller partitions, and the procedure is repeated $k$ times until we reach a single cell of the $N \times N$ adjacency matrix and place an edge. Figure 1 illustrates the idea and Algorithm 1 makes it concrete.

Algorithm 1 SKG

for i = 1, ..., E do
    u = 0; v = 0
    for j = 1, ..., k do
        With probability p_rs choose subregion (r, s)
        u = nu + r; v = nv + s
    end for
    Add edge (u, v)
end for
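The edge-dropping procedure of Algorithm 1 can be sketched in a few lines of Python. This is a minimal serial illustration rather than the authors' implementation: the function name `skg_edges` and the `seed` argument are hypothetical, and colliding (duplicate) edges are kept rather than re-drawn. The initiator matrix and the values of $k$ and $E$ are those listed in the caption of Figure 3.

```python
import random

def skg_edges(P, k, E, seed=0):
    """Drop E edges into an n^k x n^k adjacency matrix, following Algorithm 1."""
    rng = random.Random(seed)
    n = len(P)
    # Flatten the initiator matrix into (r, s) cells and their probabilities.
    cells = [(r, s) for r in range(n) for s in range(n)]
    weights = [P[r][s] for r, s in cells]
    edges = []
    for _ in range(E):
        u = v = 0
        for _ in range(k):
            # Choose subregion (r, s) with probability p_rs, then descend into it.
            r, s = rng.choices(cells, weights=weights)[0]
            u, v = n * u + r, n * v + s
        edges.append((u, v))  # duplicates ("collisions") are not removed here
    return edges

P = [[0.4532, 0.2622], [0.2622, 0.0225]]
edges = skg_edges(P, k=12, E=11400)
```

Each edge is drawn independently of the others, but the loop is organized by edge rather than by vertex, which is the property that makes this formulation awkward in a vertex-centric setting.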

Let $P^{[k]} = P \otimes P \otimes \cdots \otimes P$ be the $k$-th Kronecker power of $P$. Then we can interpret SKG as generating $E$ edges independently,$^1$ where any given edge is $(u,v)$ with probability $(P^{[k]})_{uv}$. Now we can apply Bayes' rule.



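$$P(\text{edge is } (u,v)) = (P^{[k]})_{uv} = \left(\sum_{v'} (P^{[k]})_{uv'}\right) \frac{(P^{[k]})_{uv}}{\sum_{v'} (P^{[k]})_{uv'}} = P(\text{source is } u)\, P(\text{destination is } v \mid \text{source is } u) \qquad (1)$$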
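The factorization can be checked numerically on a small case. The sketch below assumes NumPy and uses a hypothetical helper `kron_power`; it takes `U` to be the vector of row sums of $P$, which is an assumption made for the illustration. It verifies that the marginal distribution of the source node is itself a Kronecker power, so it never requires the full matrix $P^{[k]}$.

```python
import numpy as np
from functools import reduce

def kron_power(A, k):
    """k-th Kronecker power of a vector or matrix (hypothetical helper)."""
    return reduce(np.kron, [A] * k)

P = np.array([[0.4532, 0.2622], [0.2622, 0.0225]])
k = 3
Pk = kron_power(P, k)                    # (n^k) x (n^k) edge probabilities
U = P.sum(axis=1)                        # assumed: probability of each source digit
source = kron_power(U, k)                # P(source is u)
dest_given_source = Pk / Pk.sum(axis=1, keepdims=True)  # P(destination is v | source is u)

# The source marginal equals the Kronecker power of the row sums,
# and the product of the two factors recovers the joint probability.
marginal_ok = np.allclose(Pk.sum(axis=1), source)
joint_ok = np.allclose(source[:, None] * dest_given_source, Pk)
```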



1 The edge generations are not quite independent due to 
possible "collisions." 



The decomposition (1) permits us to choose the source node first and then the destination node, rather than both simultaneously, and we do this with a recursive algorithm to avoid the explicit construction of $P^{[k]}$. Let

$$U_r = \sum_{s=0}^{n-1} p_{rs}, \qquad V_{rs} = \frac{p_{rs}}{U_r},$$

and we arrive at Algorithm 2, which is equivalent to the original formulation of SKG.

Algorithm 2 Equivalent SKG

for i = 1, ..., E do
    // Select source node u
    u = 0
    for j = 1, ..., k do
        With probability U_r choose subregion r
        u = nu + r
    end for
    // Select destination node v
    v = 0; z = u
    for j = 1, ..., k do
        l = mod(z, n)
        With probability V_ls choose subregion s
        v = nv + s; z = z/n (integer division)
    end for
    Add edge (u, v)
end for

Here we note that the outcome of the source node selection procedure is (approximately) a multinomial random variable with parameters $E$ and $U^{[k]}$, where $U^{[k]}$ is the $k$-th Kronecker power of $U$.

2.2 Poisson Stochastic Kronecker Graph 

Due to the following elementary result [i], we can replace the source node selection procedure, a multinomial random variable, with independent Poisson random variables.

Lemma. Let $X_1, \dots, X_s$ be independent Poisson random variables with means $\alpha p_1, \dots, \alpha p_s$, where $\alpha > 0$; $p_1, \dots, p_s \ge 0$; and $\sum_{i=1}^{s} p_i = 1$. Then

$$P\left[X_1 = x_1, \dots, X_s = x_s \,\middle|\, \sum_{i=1}^{s} X_i = m\right] = \frac{m!}{x_1! \cdots x_s!}\, p_1^{x_1} \cdots p_s^{x_s},$$

i.e., conditioned on the sum, $X_1, \dots, X_s$ is distributed as a multinomial distribution.
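The standard Poisson-to-multinomial identity behind this lemma can be checked exactly for a small case using only the Python standard library. The function names and the particular values of `xs`, `ps`, and `alpha` below are hypothetical, chosen only for illustration. The check relies on the fact that a sum of independent Poisson variables is Poisson with the summed mean.

```python
from math import exp, factorial, prod

def poisson_pmf(x, mean):
    return exp(-mean) * mean**x / factorial(x)

def conditional_poisson(xs, alpha, ps):
    """P[X_1=x_1,...,X_s=x_s | sum X_i = m] for independent X_i ~ Poisson(alpha * p_i)."""
    m = sum(xs)
    joint = prod(poisson_pmf(x, alpha * p) for x, p in zip(xs, ps))
    # The sum is Poisson with mean alpha, because the p_i add up to 1.
    return joint / poisson_pmf(m, alpha)

def multinomial_pmf(xs, ps):
    m = sum(xs)
    coef = factorial(m) // prod(factorial(x) for x in xs)
    return coef * prod(p**x for x, p in zip(xs, ps))

xs, ps = (2, 1, 3), (0.5, 0.2, 0.3)
lhs = conditional_poisson(xs, alpha=7.0, ps=ps)
rhs = multinomial_pmf(xs, ps)
```

The two values agree up to floating-point rounding, and the conditional probability does not depend on the chosen `alpha`.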

We are finally ready to state the main algorithm of this paper. Let $E$ be the expected number of total edges, while $P$, $U$, $V$, $k$ are defined the same as before.



Algorithm 3 PSKG

Scatter E, P, U, V, k
for each vertex u do
    // Determine out-degree of u
    p = 1; z = u
    for j = 1, ..., k do
        l = mod(z, n); p = p U_l
        z = z/n (integer division)
    end for
    Generate X ~ Poisson(E p)
    // For each edge determine destination vertex
    for i = 1, ..., X do
        v = 0; z = u
        for j = 1, ..., k do
            l = mod(z, n)
            With probability V_ls choose subregion s
            v = nv + s; z = z/n (integer division)
        end for
        Add edge (u, v)
    end for
end for
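The per-vertex work of Algorithm 3 can be sketched as a single function that needs no information from other vertices. This is an illustrative NumPy sketch, not the Giraph implementation: the function name `pskg_vertex_edges` is hypothetical, and it assumes $U_r = \sum_s p_{rs}$ and $V_{rs} = p_{rs}/U_r$. A single shared random generator is used for simplicity, whereas a distributed run would presumably give each worker its own stream.

```python
import numpy as np

def pskg_vertex_edges(u, P, k, E, rng):
    """Generate the out-edges of vertex u independently of all other vertices."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    U = P.sum(axis=1)        # assumed definition: U_r = sum_s p_rs
    V = P / U[:, None]       # assumed definition: V_rs = p_rs / U_r
    # Base-n digits of u, least significant first, as in the pseudocode.
    digits, z = [], u
    for _ in range(k):
        digits.append(z % n)
        z //= n
    # Expected out-degree: E times the product of U over the digits of u.
    X = rng.poisson(E * np.prod(U[digits]))
    edges = []
    for _ in range(X):
        v = 0
        for l in digits:
            s = int(rng.choice(n, p=V[l]))   # choose subregion s with probability V_ls
            v = n * v + s
        edges.append((u, v))
    return edges

rng = np.random.default_rng(0)
P = [[0.4532, 0.2622], [0.2622, 0.0225]]
k, E = 6, 2000               # small hypothetical sizes for a quick run
edges = [e for u in range(2**k) for e in pskg_vertex_edges(u, P, k, E, rng)]
```

The total number of edges is random with mean close to $E$, in contrast to Algorithm 1, where it is exactly $E$.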



PSKG retains all the desired properties of SKG graphs. Specifically, suppose a desired property holds for SKG graphs of all sizes with probability at least $1 - \varepsilon$. Then PSKG graphs also have the property with probability at least $1 - \varepsilon$, by the following lemma.

Lemma. Let $A$ be an event that occurs with probability at least $1 - \varepsilon$ for SKG graphs of all sizes. Then $A$ also holds with probability at least $1 - \varepsilon$ for PSKG graphs.



2.3 Probabilistic Load Balancing 

In vertex-centric BSP, where each computational worker takes ownership of vertices and their outgoing edges, it is not a priori clear how to distribute the vertices; some vertices have more neighbors than others, so assigning an equal number of vertices to each worker will likely result in load imbalance. As the storage requirement of the graph structure is proportional to the number of neighbors, we shall discuss load balancing with the goal of distributing the number of edges equally.

Let $N_w$ denote the total number of workers among which to balance the load, and let $\mathrm{id} = 0, 1, \dots, N_w - 1$ denote the individual processor number. Let $U^{[k]}$ be the $k$-th Kronecker power of $U$ and $(U^{[k]})_u$ the expected load proportion for vertex $u$. Now we split the set of vertices into contiguous partitions (contiguous by node numbering) so that each partition has a total load of about $1/N_w$ and is owned by one processor. This procedure is illustrated in Figure 2.
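For a graph as small as the one in Figure 2, the partition can be computed directly by forming the load vector explicitly. The NumPy sketch below does so for $n = 2$, $U = (0.55, 0.45)$, $k = 3$, and $N_w = 4$. The rule used to place the partition boundaries (each worker starts at the vertex whose cumulative-load interval contains $\mathrm{id}/N_w$) is an assumption made for the illustration.

```python
import numpy as np
from functools import reduce

U = np.array([0.55, 0.45])            # values from the caption of Figure 2
k, N_w = 3, 4
load = reduce(np.kron, [U] * k)       # expected load share of each of the 2^k vertices
cumulative = np.cumsum(load)
percent = np.round(100 * cumulative).astype(int)   # 17, 30, 44, 55, 69, 80, 91, 100

# First vertex of each worker: the vertex whose interval contains id / N_w.
first_vertex = np.searchsorted(cumulative, np.arange(N_w) / N_w, side="right")
```

Here `first_vertex` comes out as 0, 1, 3, 5, so under this rule the workers own vertices {0}, {1, 2}, {3, 4}, and {5, 6, 7}. Forming `load` explicitly costs $O(n^k)$ memory, which is only acceptable for toy sizes.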

To do this partitioning efficiently, however, one must avoid explicitly forming $U^{[k]}$. Algorithm 4$^2$ achieves this by traversing the decision tree without explicitly forming it.

One legitimate concern with this strategy is that the load is only balanced in expectation, and therefore it is possible that, with bad luck, the actual load is highly unbalanced. However, Theorem 1 tells us that with high probability the load imbalance is small.

Theorem 1. The $\alpha$-level confidence interval of the maximum load over all computational nodes is $[E/N_w, E/N_w + \delta]$, where

$$\delta = \sqrt{\frac{2E}{N_w}}\, \sqrt{\log N_w + \left|\log\left|\log(1 - \alpha)\right|\right|}$$

and $E/N_w$ is the load under perfect balance. We can interpret $\delta$ as the degree of load imbalance.
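To get a feel for the size of the imbalance, the bound can be evaluated numerically. The sketch below assumes the bound reads $\delta = \sqrt{2E/N_w}\,\sqrt{\log N_w + |\log|\log(1-\alpha)||}$, which is a reading of the theorem rather than a verified formula. The function name `load_imbalance` and the values of $E$, $N_w$, and $\alpha$ are hypothetical.

```python
from math import log, sqrt

def load_imbalance(E, N_w, alpha):
    """delta under the assumed reading of Theorem 1 (not a verified formula)."""
    return sqrt(2 * E / N_w) * sqrt(log(N_w) + abs(log(abs(log(1 - alpha)))))

E, N_w, alpha = 1_000_000, 16, 0.95   # hypothetical values
delta = load_imbalance(E, N_w, alpha)
relative = delta / (E / N_w)          # imbalance relative to the perfectly balanced load
```

With these values the balanced load is 62500 edges per worker and `delta` is roughly 1% of that. Under this reading, $\delta$ grows like $\sqrt{E/N_w}$ while the balanced load grows like $E/N_w$, so the relative imbalance shrinks as the graph grows.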



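Proof of the lemma in Section 2.2. Let $m$ denote the (random) total number of edges of the PSKG graph. Conditioned on $m$, the PSKG graph is distributed as an SKG graph with $m$ edges, so

$$P_{\text{Poisson}}(A) = E_m\left[P_{\text{Multinomial}}(A \mid m)\right] \ge E_m[1 - \varepsilon] = 1 - \varepsilon. \qquad \square$$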



Proof of Theorem 1. We first make the assumption that the expected load, $E$, is split and distributed perfectly, i.e., each worker will have a load of $X_i$, where $X_0, \dots, X_{N_w - 1}$ are i.i.d. Poisson random variables with mean $E/N_w$. Let $F$ denote their common distribution function and let $M$ be the upper bound of the confidence interval.

$$P\left(\max_i X_i < M\right) = P(X_0 < M)^{N_w} = F^{N_w}(M) \le 1 - \alpha$$

$$M \le F^{-1}\left((1 - \alpha)^{1/N_w}\right) \approx F^{-1}\left(1 + \frac{\log(1 - \alpha)}{N_w}\right) \approx \frac{E}{N_w} + \sqrt{\frac{E}{N_w}}\, \Phi^{-1}\left(1 + \frac{\log(1 - \alpha)}{N_w}\right)$$

Here $(1 - \alpha)^{1/N_w}$ is approximated by its Taylor series and $X_i$ is approximated by a normal random variable $\mathcal{N}(E/N_w, E/N_w)$, as given by the central limit theorem. Abramowitz [1] provides the following bound.

$$\sqrt{2\pi}\,(1 - \Phi(x)) = \int_x^{\infty} e^{-t^2/2}\, dt < 2 e^{-x^2/2} \quad \text{for } x > 0$$

Using the fact that for non-increasing functions $f < g$ implies $f^{-1} < g^{-1}$, we arrive at the following.

$$\Phi^{-1}(1 - x) < \sqrt{-2 \log x}$$

Putting these results together with $x = -\log(1 - \alpha)/N_w$, and noting that $-\log x = \log N_w - \log|\log(1 - \alpha)| \le \log N_w + |\log|\log(1 - \alpha)||$, gives the theorem. □

2 The algorithm is a simplified version specifically for $n = 2$. The generalization to arbitrary $n$ is straightforward.

[Figure 2: the $2^3 = 8$ vertices with the load of each vertex and the cumulative load; the cumulative loads read 17%, 30%, 44%, 55%, 69%, 80%, 91%, 100%.]

Figure 2: An example of load balancing where $n = 2$, $p_1 = 0.55$, $p_2 = 0.45$, $k = 3$, and $N_w = 4$. The second to last line denotes the load of each vertex and the last line denotes the cumulative load. The 4 processors attempt to take 25% of the total load each. Consequently processors number 0, 1, 2, and 3 take ownership of the blue, green, purple, and red vertices, respectively.



Algorithm 4 Load Balancing

for each worker do
    r_low = id/N_w; r_up = (id + 1)/N_w
    b = 0; // lower bound
    p_range = 1; // probability range
    u_low = 0; // lower vertex id
    for i = 1, ..., k do
        if r_low < b + p p_range then
            p_range = p p_range; u_low = 2 u_low
        else
            b = b + p p_range
            p_range = (1 - p) p_range; u_low = 2 u_low + 1
        end if
    end for
    Repeat above with r_up to obtain u_up
    Claim ownership of nodes u_low to u_up
end for
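The tree traversal of Algorithm 4 can be sketched in Python for the $n = 2$ case. The sketch makes several assumptions: $U = (p, 1 - p)$, a boundary is tested against the split point $b + p \cdot p_{\text{range}}$ of the current range, each worker claims the half-open range $[u_{\text{low}}, u_{\text{up}})$, and a boundary of 1 maps to one past the last vertex. The function names are hypothetical.

```python
def boundary_vertex(r, p, k):
    """Vertex whose cumulative-load interval contains r, for n = 2 and U = (p, 1 - p)."""
    if r >= 1.0:
        return 2**k                    # one past the last vertex (assumed convention)
    b, p_range, u = 0.0, 1.0, 0
    for _ in range(k):
        if r < b + p * p_range:        # r lies in the first (digit 0) sub-range
            p_range *= p
            u = 2 * u
        else:                          # r lies in the second (digit 1) sub-range
            b += p * p_range
            p_range *= 1 - p
            u = 2 * u + 1
    return u

def owned_range(worker_id, N_w, p, k):
    """Half-open vertex range [u_low, u_up) claimed by one worker."""
    u_low = boundary_vertex(worker_id / N_w, p, k)
    u_up = boundary_vertex((worker_id + 1) / N_w, p, k)
    return u_low, u_up

# Parameters of the Figure 2 example.
ranges = [owned_range(i, N_w=4, p=0.55, k=3) for i in range(4)]
```

For the Figure 2 parameters this gives the ranges (0, 1), (1, 3), (3, 5), and (5, 8). Each boundary costs $O(k)$ arithmetic operations and constant memory, and every worker can compute its own range without communicating with the others.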



3 Experimental Results 

In this section, we demonstrate that PSKG and SKG generate graphs with essentially the same properties. As SKG models real-world networks well [7], the equivalence between PSKG and SKG implies the modeling power of PSKG.

To generate and analyze the SKG and PSKG graphs, we used the SNAP library [10] and our own implementation of PSKG on Apache Giraph [6], respectively.

3.1 Graph Patterns 

There are several standard graph patterns that are 
used to compare the similarity between networks. In 
this paper, we shall use the following patterns: de- 
gree distribution, hop plot, scree plot, and network 
values. These choices are motivated by Leskovec's 
[7] work. 

Degree distribution: The histogram of the nodes' 
degrees with exponential binning. 

Hop plot: Number of reachable pairs r(h) within 
h hops, as a function of the number of hops h. 

Scree plot: Singular values of the graph adjacency 
matrix versus their rank. 

Network values: Distribution of the principal 
eigenvector components versus their rank. 
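Two of these patterns are simple enough to sketch directly with NumPy. The functions below are illustrative only and are not the SNAP routines used for the experiments: `degree_histogram` counts out-degrees in bins that double in width, and `scree` returns the singular values of a dense adjacency matrix, which is feasible only for small graphs. The toy edge list is hypothetical.

```python
import numpy as np

def degree_histogram(edges, N):
    """Out-degree histogram with exponentially growing bins [1,2), [2,4), [4,8), ..."""
    deg = np.bincount([u for u, _ in edges], minlength=N)
    top = max(int(deg.max()), 1)
    bins = 2 ** np.arange(int(np.log2(top)) + 2)   # 1, 2, 4, ... past the max degree
    counts, _ = np.histogram(deg[deg > 0], bins=bins)
    return bins[:-1], counts

def scree(edges, N):
    """Singular values of the dense adjacency matrix, largest first."""
    A = np.zeros((N, N))
    for u, v in edges:
        A[u, v] = 1.0          # duplicate edges collapse to a single entry
    return np.linalg.svd(A, compute_uv=False)

edges = [(0, 1), (0, 2), (0, 3), (1, 0), (2, 0)]   # toy graph, not from the paper
lower_edges, counts = degree_histogram(edges, N=4)
sigma = scree(edges, N=4)
```

Plotting `counts` against `lower_edges` and `sigma` against rank, both on log-log axes, gives the degree distribution and the scree plot. For graphs of realistic size a sparse singular value solver would be needed instead of the dense decomposition.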

Figures 3 and 4 compare the graph patterns of SKG and PSKG. The results are essentially the same.

4 Conclusion 

In conclusion, PSKG is a network generation model that is more efficient than, and yet as powerful as, SKG. Sections 2 and 3 provide theoretical and empirical evidence, respectively, for this statement.

One promising direction of future work is vertex- 
centric algorithms for model estimation. There has 
been much work on SKG model fitting, which should 
directly apply to PSKG, but most do not concern 
vertex-centric parallelism. It would be interesting 
to see the efficiency a vertex-centric distributed fit- 
ting algorithm can achieve compared to a serial or 
MapReduce implementation.

[Figure 3, panel (a): Degree distribution]

Figure 3: Graph patterns with parameters $n = 2$, $k = 12$, $E = 11400$, and $P = [0.4532, 0.2622; 0.2622, 0.0225]$.

Figure 4: Graph patterns with parameters $n = 4$, $k = 8$, $E = 263546$, and $P = [\alpha, \alpha, \alpha, \alpha; \alpha, \alpha, \beta, \beta; \alpha, \beta, \alpha, \beta; \alpha, \beta, \beta, \alpha]$, where $\alpha = 0.0861$ and $\beta = 0.0231$. $P$ is the adjacency matrix of a star graph on 4 nodes (center + 3 satellites) with the 1's replaced by $\alpha$ and the 0's replaced by $\beta$.